CONTENT IN CONTEXT: THE FUTURE OF SGML AND HTML
by Jerry Michalski 


Every time a new software technology finds a market, someone has corralled, 
structured and put to work a new information type, from accounting and pay- 
roll systems to CAD, spreadsheets and multimedia. Most text-related tools 
(e.g., text retrieval, document management and document routing) deal with 
text from the outside, as opaque objects. Word processors assist in text 
manipulation, but text has resisted efforts to make its internal structure 
explicit and automatically manipulable for a long time. The fact that val- 
uable documents are often lengthy, complex, self-referencing and encrusted 
with graphics, videos and other stuff has made the job more difficult. One 
of the most important advances in making text more useful is the Standard 
Generalized Markup Language (SGML), which we covered at length in Release 
1.0, 7-91. 


SGML brings database capabilities to raw text collections. It allows 
people to manipulate fields within large documents. They can define and 
find abstracts, subsections, captions, bulleted or numbered lists,
copyright notices, bibliographies and other document elements easily and 
re-use them in other contexts. People can limit text searches to specific 
parts or elements of a document, which greatly improves query precision and 
recall. SGML also smooths text exchange between organizations with dis- 
similar computer systems. 


The greatest use of SGML has been in vertical markets where certain
organizations need to exchange well-specified kinds of information with
each other, or where a dominant buyer -- say, the Department of Defense or
Ford Motors -- requires SGML for new documentation. With SGML, the US
military has streamlined the way it manages documentation, from
procurement to training

retail markets. But lately in a new guise, as part of the Internet's World
Wide Web, it has attracted a great deal of attention from companies inter- 
ested in publishing and communications (see Release 1.0, 1-94). The Web al- 
lows people to publish "pages" of information containing mixed text, graph- 
ics and sounds that are highly linked to each other. To do so, the Web uses 
the HyperText Markup Language (HTML), a highly simplified and stylized, 
network-centric application of SGML. Many people have experimented with 
HTML, so there are some incompatible variants in use. Standards committees 
seem to be well on their way to set a new baseline for HTML (see page 8). 


Both SGML and HTML define tags embedded in documents to identify different 
document elements. Most tags are set off from regular text with angle 
brackets; ending tags start with a slash character. Thus, "<h1>Moby
Dick</h1>" causes the interpreting application to identify the text "Moby
Dick" as a first-level heading, which may mean it presents it in a larger 
font size than the normal text, centered and boldface, and prints it twice, 
once on the title page and again above the table of contents. 


This month: acronym co-evolution 


Eventually, HTML will probably do more than draw attention to SGML: It may 
help reframe our understanding of internal document structures and rela- 
tionships among documents, and their role in the broader information envi- 
ronment. This issue of Release 1.0 examines the co-evolution of HTML and 
SGML and some of the software products from the companies and consortia who 
are driving this process. 


This is a large subject, so there are areas that we don’t cover in this
issue, such as secure transaction services, mathematical equations, seals
of approval and rules about re-use. We may cover some of those in future
issues. Our focus this month is on content in context. Here are some of
our conclusions, listed from the practical and short-term to the more
abstract and long-term.


o HTML, which many people now deal with in its raw, unadorned state with
  simple text-editing tools, will disappear behind graphical interfaces
  (see page 14). In fact, the Web will become more balanced between the
  creators and users of content. Lightweight authoring and editing tools
  will replace today’s limited browsers, which will lower the barrier for
  creation of content and links.


o Other tools will get dragged into the mix. SGML and HTML don't provide
  enough visual control for some applications, most obviously traditional
  publishing. Soon, Adobe's Acrobat and full-fledged SGML will run on
  the Web (see page 11). Soon, page-layout, graphic-design and other
  functions will fit in.


o HTML will be a proper SGML DTD, but will remain a maverick. For
  example, HTML doesn’t need everything to be defined and declared up
  front. The two communities will adapt to each other.


o HTML’s anchor tag, though limited, changes the nature of SGML (see page
  12). SGML gives content structure and local context; HTML gives it
  connectivity and global context.


o As people grow more comfortable linking chunks of information, they
  will create new forms of expression, structures, etc. People will
  weave their content into the larger context and share the results.


o The co-evolution of SGML and HTML will lead to new insights about the
  intrinsic structure of documents, document collections and the broader
  realm of information. Of special note are the differences between
  local and global structure and how to make best use of them in
  different contexts. We will be better equipped to strike a balance
  between the formal and the informal, precision and fuzziness,
  messaging and broadcasting.


The focus on structure and context balances the WYSIWYG obsession that has 
pervaded much of personal computing for the past decade. 


Design intent 


It’s helpful to explore the differences between SGML and HTML by examining 
the designers’ intents. SGML began as a way to gain efficiencies and power 
in document creation, distribution and re-use. However, its creators rapid- 
ly expanded the original scope to an abstract representational scheme that 
would allow them to manipulate and re-use components of large documents for 
various purposes across many different industries, and therefore across many 
document types. Initially, documents were self-contained: They could con- 
tain complex structures, subdocuments and many internal references, but they 
were not linked directly to other documents. 


HTML's developers sought to create a way to share information over the In- 
ternet, which they regarded as a communication medium. They used SGML as a 
starting point, since it was already mature and offered a useful framework, 
but they stripped out many of SGML’s features and added Internet-based link- 
ing capabilities. More importantly, they ignored some of the SGML communi- 
ty’s implicit assumptions. They restricted the number of features they 
would support, but made the interpreters forgiving. If a browser hits an 
unknown tag, it will ignore it instead of stopping or crashing the program. 
They made the system easily extensible and even planned for an evolutionary 
dynamic which might lead to HTML’s replacement over time, if other protocols 
prove more useful and popular. In a way, SGML was designed to constrain; 
HTML was designed to empower. 
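
A minimal Python sketch of that forgiving behavior follows. It is our
illustration, not code from any browser: the KNOWN_TAGS set and the
ForgivingRenderer class are hypothetical names, and the standard library's
html.parser module stands in for a real HTML interpreter.

```python
from html.parser import HTMLParser

# Hypothetical "core" tag set for this sketch; real browsers know many more.
KNOWN_TAGS = {"h1", "p", "a"}


class ForgivingRenderer(HTMLParser):
    """Labels text with its enclosing known tag and skips unknown tags."""

    def __init__(self):
        super().__init__()
        self.output = []     # (innermost known tag or None, text) pairs
        self.open_tags = []  # stack of known tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN_TAGS:
            self.open_tags.append(tag)
        # Unknown tag: do nothing and keep reading.

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop back to (and including) the matching open tag.
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            current = self.open_tags[-1] if self.open_tags else None
            self.output.append((current, text))


def render(markup):
    """Return the labeled text found in a string of markup."""
    parser = ForgivingRenderer()
    parser.feed(markup)
    parser.close()
    return parser.output
```

Fed the markup <h1>Moby Dick</h1><sparkle>Call me Ishmael.</sparkle>,
render() labels the first string as a first-level heading and passes the
second through as plain text; the unknown <sparkle> tag is dropped without
complaint.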


"When you ask SGML people what their environment’s 
topology is, they say it’s a hierarchy. When you ask 
HTML people the same question, they say it’s a web."

-- Haviland Wright, Interleaf 


The SGML way


In the early 1970s, Charles Goldfarb led an IBM research project to develop
integrated law-office information systems. In the process, he, Edward
Mosher and Raymond Lorie developed a system they called the Generalized
Markup Language.[1] The international standards community picked up the
idea, added the word "Standard," and SGML was on its way. In 1986, SGML 
became an international standard (ISO 8879); now it’s the hub of a growing 
industry and the foundation for millions of documents already online and 
many more on their way. 


SGML is a syntax for defining document frameworks so that organizations can 
exchange documents easily, manipulate and search them, put them to dif- 
ferent uses with shared data or simply revise and reissue them with less 
effort. SGML is nothing without Document Type Definitions (DTDs), which 
define in great (but still abstract) detail the valid and expected com- 
ponents for documents. That is, DTDs define how many levels of headings a 
chapter can have, but not their presentation (what font and style they will 
show up in). That's the job of a publishing or output-formatting package. 
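
To make the structure-versus-presentation split concrete, here is a small
Python sketch. It is not SGML and not a real DTD: the CONTENT_MODEL table,
the element names and the validate() function are all assumptions made up
for illustration, and a real DTD also constrains order and repetition,
which this toy ignores.

```python
# Hypothetical content model: which child elements each element may contain.
# It says nothing about fonts or styles, only about structure.
CONTENT_MODEL = {
    "chapter": {"title", "section"},
    "section": {"title", "para", "subsection"},
    "subsection": {"title", "para"},
    "title": set(),
    "para": set(),
}


def validate(element, model=CONTENT_MODEL):
    """Return a list of structural errors for a (name, children) tree."""
    name, children = element
    if name not in model:
        return [f"undeclared element: {name}"]
    errors = []
    for child in children:
        child_name = child[0]
        if child_name not in model[name]:
            errors.append(f"{child_name} not allowed inside {name}")
        else:
            errors.extend(validate(child, model))
    return errors
```

Because a subsection may hold only titles and paragraphs, this model caps
a chapter at two levels of headings below the chapter title. Nesting a
subsection inside a subsection yields an error, and nothing in the table
mentions how any heading should look.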


DTDs are designed by their creators to identify structural elements such as 
titles, authors’ names, captions, key words, footnotes, copyrights, illus- 
trations, list items and tables. Developers can also create special tags 
such as drug names in FDA filings and cross-reference them to data sources. 
Once identified, these elements can then be parameters for queries or other 
processes. If you've tagged all the key words in a document, generating an 
index or cross-reference is easy. So is generating a table of contents to 
any arbitrary depth, if you've tagged the headings and subheads. So is ap- 
plying visual formatting. 
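
The table-of-contents claim can be sketched in a few lines of Python. This
is an illustration under stated assumptions: headings are taken to be
tagged h1 through h6 (HTML's convention, not a requirement of SGML), and
HeadingCollector and table_of_contents are hypothetical names.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records (level, text) for every h1..h6 element it meets."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def table_of_contents(markup, max_depth=2):
    """Return indented heading lines down to the requested depth."""
    collector = HeadingCollector()
    collector.feed(markup)
    collector.close()
    return [
        "  " * (level - 1) + text
        for level, text in collector.headings
        if level <= max_depth
    ]
```

Changing max_depth is all it takes to produce a shallower or deeper table;
the document itself is untouched, which is the point of tagging structure
rather than appearance.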


In the SGML flow 


To set up an SGML-based publishing system one must participate in a food 
chain of sorts. First, administrators define DTDs for their companies. 
They can do this from scratch in text editors; they can use specialized 
tools such as MicroStar’s Near & Far; or they can choose from industry- 
standard DTDs, much as they would look for an existing, specific EDI pack- 
age if they wanted to sell shoestrings to K mart, for example. 


Once DTDs are in place, people can create content that adheres to those
definitions with SGML-aware authoring tools such as SoftQuad’s
Author/Editor, Datalogics’ WriterStation or Microsoft's SGML Author for
Word (see page 14). If they don’t have such tools or are dealing with
legacy documents, they can use specific filters or more intelligent
conversion programs, such as Avalanche Development's FastTAG,[2] that read
files in other formats, find patterns, deduce structure and apply markup
accordingly. This process isn’t always perfect, but it can perform the
bulk of the work.


Finally, validation parsers such as Exoterica’s Omnimark or those built
into SGML authoring software confirm that documents are properly tagged,
given a specific DTD. Once documents are in proper SGML, users can store
them in document-management systems and use text-retrieval tools that can
use the SGML information, such as IDI’s BasisPlus. Also, some vendors
create environments that bundle many of these features, such as Electronic
Book Technologies’ DynaText.


[1] Goldfarb recently left IBM to form his own consultancy, Information
Management Consulting.


[2] Avalanche is now a subsidiary of Interleaf.


Industry by industry 


Early SGML customers could best cost-justify systems for high-value
applications, typically within specific industries, so the SGML market
quickly developed many submarkets. Some of the most important SGML
initiatives are the US military’s Continuous Acquisition and Life-cycle
Support (CALS);[3] the Association of American Publishers’ Electronic
Manuscript Project; and the International Committee for Accessible
Document Design (ICADD) system for Braille, synthesized voice and
large-print editions. Other notable efforts include the Pinnacles Group of
semiconductor suppliers and the Text Encoding Initiative for humanities
research.


The tendency toward deep, vertical-market DTDs had a negative side. Until 
recently, SGML "goodness" was measured by how intricate and erudite DTDs
were, even if the results were sometimes hard to use. Some SGML purists 
derided efforts that were useful but not complex or well-behaved, such as 
HTML. That attitude is changing quickly. 


"In the SGML market, as soon as someone puts out a pro- 
duct, everyone wants to measure it against CALS or ATA 
[SGML specifications]. HTML is a stark statement that 
an appropriate DTD doesn’t look anything like CALS. The 
focus in SGML is in the detailed stuff; the market op- 
portunity is in the simple stuff." 

-- Bill Zoellick, Avalanche Development 


The World Wide Web and HTML 


In 1989, Tim Berners-Lee wanted to create a way for physicists at the
European Laboratory for Particle Physics (CERN) in Switzerland to
collaborate more effectively over the Internet, which they already used
heavily for e-mail, Usenet newsgroups and ftp (the file transfer protocol).
He wanted to create a flexible and forgiving network fabric that could
evolve according to its users’ needs. Over the next couple of years
Berners-Lee developed the World Wide Web, for which he created the
HyperText Markup Language, the server-side HyperText Transfer Protocol
(HTTP) and Universal Resource Locators (URLs), which are a standard
nomenclature to determine where a file or application is and what protocol
it runs.
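
A URL's two jobs, naming the protocol and naming the place, are easy to
see by pulling one apart. The sketch below uses Python's standard
urllib.parse module; the describe() helper is a hypothetical name, and the
addresses passed to it are examples only.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL into the protocol to speak and the place to look."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,  # which protocol to use
        "host": parts.hostname,    # which machine to contact
        "path": parts.path,        # which file or application on it
    }
```

Called on http://info.cern.ch/hypertext/WWW/TheProject.html, describe()
reports the protocol as http, the host as info.cern.ch and the path as
/hypertext/WWW/TheProject.html. An ftp address is taken apart the same
way, which is what lets one naming scheme cover several protocols.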


Although Berners-Lee knew SGML and wanted to base the new system on it, he
designed HTML from scratch and with the specific intent to avoid much of
SGML’s baggage. For example, HTML doesn't require all tags or protocols to
be pre-declared; it simply ignores things it can’t understand. Thus HTML
is not a subset of SGML, but rather is a specific, simple instance of a DTD
created specifically for the Web. In Internet style, Berners-Lee put his
early prototype code out for others to use and requested that people who
tried it send him improvements and suggestions.


[3] Previously "Computer-aided Acquisition and Logistics Support."


HTML and URLs aren’t difficult to master, nor are they particularly power- 
ful taken separately. The power comes from the consistency and breadth of 
use that they get. People can use the simplest of text editors and a pinch 
of imagination (plus some patience!) to create content for the Web. They 
don't need advanced degrees to do it. With a little coaching, fifth-grade 
munchkins can put their own so-called home pages on the Web, as Sun’s John 
Gage demonstrated at the 1994 PC Forum. 


Now, like unwitting, collaborating hives of mutant tapestry-weaving ants, 
people around the world are working to develop hypertext documents and 
weave them with others on the World Wide Web. The Web is the fastest- 
growing part of the fast-growing Internet. 


The acid test: Will it play in Mosaic? 


Many people use awkward combinations of tools to publish on the Web. For 
example, some people run a text editor such as vi or emacs in one window 
and Mosaic in another. After they make some changes to their HTML pages, 
they save them to disk, refresh the browser’s display and see if their 
changes worked. That, strangely enough, is the measure of a "good" HTML 
page: It’s one that will display properly on a majority of the client 
browser applications, especially Mosaic. If it doesn’t, you start over. 


Berners-Lee never intended Web authors to see -- let alone have to know --
HTML.[5] The first version of Web software, which Berners-Lee built in
NextStep (now OpenStep), had an integrated editor and browser. When Marc 
Andreessen and his colleagues at the National Center for Supercomputing Ap- 
plications (NCSA) wrote the Mosaic Web client, they built a read-only brow- 
ser because Andreessen felt it would be too difficult to do an integrated 
package. (Andreessen and most of that team are now at Mosaic Communica- 
tions.) HTML’s visibility is largely a result of that decision (and, of 
course, the fact that we don’t all have NeXT workstations). 


Features or bugs? 


HTML isn’t perfect. Until recently, available authoring systems were
primitive (see page 14 for some new offerings). Publishers fear that HTML
permanently limits their control over how their material appears on-screen.
Information shows up in the client software's fonts, sizes and colors.
Even if publishers work within HTML’s limitations, its stream-oriented
structure limits their creativity: With current tools, text can’t flow
around images or into multi-column layouts.


[4] Recently, Dan Connolly of HaL Computer Systems put a server on the
Internet that can validate that HTML pages conform to the HTML 2.0
specification (see Resources, page 19). Mail it a page of HTML, and it
sends back a list of comments from SGMLS, a validation parser. Georgia
Tech has enhanced this server so you need only send a pointer (URL) to the
page in question, not the whole document.


[5] Similarly, the developers of the ISO e-mail standard, X.400, never
intended for end-users to see the ugly addresses, but X.500 took far longer
to finish than they expected.


The infrastructure is thin, but it’s mostly a sign of opportunity. 
Document- and link-management capabilities are usually limited to what the 
file system provides. Companies have only begun to create Web publishing 
systems backed with production databases. As people publish more documents 
and links proliferate, things can get messy. However, any individual’s 
view of the Web is only as messy as the quality of the sources she chooses 
to link to. Most HTML links are coarse: They point to documents, not 
specific objects within documents. Named anchors offer finer granularity, 
but their use requires write access for the link author. Often these miss- 
ing features exist in SGML but were sacrificed in order to make HTML as 
simple as possible. Most of them are addressed either in HTML 3.0 or by 
companies that see them as opportunities. 


The Web way 


In their search for more functionality, some people re-interpret HTML's 
features in unexpected ways; others experiment with their own variants and 
protocol extensions. HTML invites a host of misunderstandings because it 
neither definitively answers layout and format questions nor points to a 
definitive solution. People use the elements inconsistently; they nest 
elements within each other in ways that aren't supposed to work. From an 
evolutionary perspective, these experiments may well help strengthen HTML’s 
gene pool. But the inconsistencies threaten the Web's long-term integrity; 
eliminating them is one of the main targets of the HTML 2.0 specification, 
described below. 


The messy, loose way things get done on the Internet often works better 
than well-organized, concerted efforts. For example, attempts to extend 
SGML with multimedia, hyperlinks and prescriptive markup have met with 
limited success. Two such efforts, HyTime and DSSSL,[6] are not for the
faint-hearted. They are criticized as too complex. Few commercial soft- 
ware tools are available for either standard. It is possible that HyTime 
and DSSSL offer functionality that will be more explicitly necessary 
several years from now. 


In general, the Web offers more reach, less control and less structure than
dedicated electronic-publishing, text-management or hypertext systems such
as EBT’s DynaText and Interleaf’s Active Documents (see Release 1.0, 7-91).
Internet tools are always more primitive than their task-specific
counterparts. They also usually cost far less. The key question developers
must answer is not merely whether they can achieve (and users can get)
satisfactory levels of functionality, but rather whether the increased
reach and malleability of Internet tools outweigh the specific
functionality of other tools.


[6] HyTime is related to the Standard Music Description Language and was
designed for time-sensitive multimedia and hyperlinks. The Document Style
Semantics and Specification Language (DSSSL; rhymes with "missile") offers
more control over page appearance.


Version control 


Many variants of HTML run on the Web. Some home pages will display on Mac- 
Web, a Mac client, but not on Mosaic, which runs mostly on Windows and X. 
HTML’s shepherds think of the varied stuff in use now (except for forms) as 
HTML 1.0. HTML 2.0 brings the spec in line with current practice (with 
forms). Dan Connolly of HaL Computer Systems in Austin leads that effort; 
he works closely with Berners-Lee. Early on, Connolly developed a reputa- 
tion as the "SGML Cop." He expects HTML 2.0 to be a published specifica- 
tion by the Internet Engineering Task Force (IETF) meeting this December. 


The World Wide Web Consortium and SGML Open 


Berners-Lee coordinated HTML’s evolution alone for several years at 
CERN. Early this year, it became obvious to him that this job would 
require more people. However, CERN is funded to do atomic physics 
research, not software development. At the same time, there was a 
strong call from industry for someone to continue playing a coor- 
dinating role for the Web so it wouldn’t lose its interoperability. 
With so much activity, so many different interests at stake and so 
much flexibility, there is a real risk of multiple, conflicting stan- 
dards emerging, with eventual gridlock. 


Accordingly, Berners-Lee left CERN to found the World Wide Web Con- 
sortium, which has centers at CERN and MIT. It has collaborations 
with NCSA, INRIA in France, Spyglass and Enterprise Integration Tech- 
nologies. The Consortium's goal is to coordinate development of the 
Web and to maintain its interoperability over time, without any major 
discontinuities. MIT made sense to Berners-Lee as a home base partly 
because the center of gravity of Internet research is in the US, and 
partly because MIT has a track record of honorable consortium behav- 
ior with industry to develop broadly useful, public standards such as 
X Windows. 


The Consortium has five staff members at MIT and another five still 
at CERN. Berners-Lee expects to double the staff by next year; first 
he must get additional sponsors and perhaps some funds from the US 
government and the European Community. 


The WWW Consortium has an SGML counterpart called SGML Open, which 
SGML vendors formed in 1991 to promote use of and interoperability 
between SGML products. (Esther Dyson is on SGML Open's advisory 
board.) The two groups actively collaborate to set directions for 
the two standards. SGML Open's chairman Yuri Rubinsky has been a 
major contributor to the HTML 2.0 specification. All of these col- 
laborators make use of Internet mailing lists, net news, Web and 
Gopher servers and other information-sharing tools to ensure open 
discussion. Subscribing to these lists or browsing the information 
on the relevant servers is a great way to follow what is going on 
(see Resources, page 19). 
The 2.0 specification defines four kinds of elements: a core set that every
Web application must support; optional features, such as inline images; in- 
teractive form elements for creating more sophisticated Web-based applica- 
tions; and obsolete elements that may be phased out in the future. The 
forms capability is particularly powerful. It allows applications to pres- 
ent custom menus, radio buttons and other features, all of which can feed 
other applications. 


HTML 3.0 (originally called HTML+) is already in the works, led by Dave 
Raggett of Hewlett-Packard’s Bristol Labs in the UK. The 2.0 and 3.0 ef- 
forts are both in the same IETF committee, which also has status as an SGML 
Open Technical Committee. Raggett hopes to show a testbed browser for HTML 
3.0 at the World Wide Web Forum this October in Chicago. He expects to 
have a discussion draft ready this November, then propose 3.0 to the IETF 
as an Internet Draft specification next spring. The testbed browser will 
be rewritten as a public-domain WYSIWYG editor for release during 1995. 


Web priorities 


Some of the proposed changes to HTML are extensions and enhancements that 
will greatly increase its functionality, such as encryption, transaction 
services and database gateways. Others involve increasing the variety of 
SGML tags that HTML supports. For example, HTML 3.0 is likely to have "au- 
thor" tags, so applications can search the Web for pages created by 
specific people, and "keyword" tags, for the creation of indexes. Still 
other proposed changes involve the Web's look and feel. These include
user-defined toolbars, figures with captions, style sheets, arbitrary in- 
line objects, object graphics, tables and text that flows around figures. 


Luckily, not everything has to be built into HTML. Raggett and others
are examining ways to modularize HTML so that special-purpose features such
as virtual reality extensions and interactive forms are easy to invoke and
don't weigh down HTML applications. A form could consist of external
scripts in an arbitrary scripting language. HTML is central -- every home
page has to use HTML -- but it is not the only language the Web speaks.


Beyond HTML


Much of the Web’s power and ability to evolve rests on two Internet
standards: the HyperText Transfer Protocol (HTTP) and the Multipurpose
Internet Mail Extensions (MIME). (Raggett has proposed an IETF working
group on HTTP.) HTTP servers negotiate capabilities with client programs
before they respond to requests. Such requests may include invoking a
different protocol or setting up a parallel session with another server,
which offers applications far more flexibility.
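
One simple form of that negotiation is choosing a document format the
client says it can handle. The Python sketch below is modeled loosely on
HTTP's Accept header and its q (quality) values. It is an assumption-laden
illustration, not the protocol's full rules: parse_accept and choose are
hypothetical names, and wildcards such as */* are deliberately ignored.

```python
def parse_accept(header):
    """Turn 'text/html, application/pdf;q=0.5' into [(type, q), ...]."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        if not fields[0]:
            continue
        quality = 1.0  # a type listed without q is fully acceptable
        for field in fields[1:]:
            if field.startswith("q="):
                quality = float(field[2:])
        prefs.append((fields[0], quality))
    return prefs


def choose(available, accept_header):
    """Pick the available type the client rates highest, or None."""
    best, best_q = None, 0.0
    for media_type, quality in parse_accept(accept_header):
        if media_type in available and quality > best_q:
            best, best_q = media_type, quality
    return best
```

A server holding both an HTML and a PDF version of a page would send the
PDF to a client that asks for "text/html;q=0.8, application/pdf", and
would have nothing suitable to send if the client accepted only a format
the server lacks.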


For example, at Networld+Interop this month, a company named Ubique pre- 
miered a product called Virtual Places, which offers interactive chat with 
colorful characters -- all inside Sesame, Ubique’s extension to NCSA 
Mosaic. Ubique, which also developed Active Mail (see Release 1.0, 2-94), 
developed a Virtual Trade Show for show producer Ziff-Davis to parallel the 
physical one. Virtual Places runs on an independent server, parallel to 
the HTTP server. The Virtual Places Protocol uses Mosaic as a backdrop, 
and lays a different protocol stream on top that supports Habitat-style, 
real-time chat and interaction (see Release 1.0, 7-93). Cartoon-like 
characters can move around on the screen, complete with word balloons. If
both systems support shared voice or video, the users can add it. 


As other interactive approaches evolve, such as the Virtual Reality Markup 
Language we described last month, Ubique will be able to take advantage of 
them. In principle, VPP can run with any client-server data-retrieval sys- 
tem, including relational databases and Lotus Notes. These capabilities 
fit well with a broad underlying trend to move beyond static Web documents 
to distributed collaborations. 


Adobe wants you to print to the Web 


Although many people send PostScript files across the Internet, 
Adobe's flagship product isn’t well suited to interactive Web life. 
However, some of its descendants are. PostScript, born in the world 
of typographers and publishers, offers extreme control over presenta- 
tion, but it’s bulky and linear: To see page 15 of a document, the 
viewing application must generate pages one to 14. 


Adobe's Acrobat generates Portable Document Format (PDF) files which
offer faithful final-form representation of documents on practically
any commonly used OS. Any application that can produce PostScript
can create a PDF file. PDF is page-oriented (if the original document
was tabloid size, the viewer must scroll and zoom around to read
it), but it offers random access through a list of pages and a full
index that it builds. In addition, PDF is more compact and has
bookmarks, annotations and linking capabilities that aren’t in
PostScript. PDF won’t become a predominant format for reusable
information, but may be an easy way to share static content that
exists in other forms.


Adobe sees Acrobat coexisting with Mosaic and is committed to
extending PDF’s functionality. It’s possible to enhance PDF by using
special tags that different applications ignore. Acrobat 2.0 provides
for third-party plug-ins and publicly distributable browsers. That
way, Adobe will put Web-compatible link capabilities into a future
version of Acrobat. Avalanche is working to add SGML information.
Physicists have created a way to publish the existing base of TeX
documents to PDF. How all this works will be essential: If users
must manually "hang" links on a PDF document, it is unlikely to be
popular. If it is transparent and bi-directional, it may do well.


The surprise candidate that might work well on the Web is Adobe's Il- 
lustrator file format. Illustrator, which is a special version of 
Encapsulated PostScript, would actually behave better than PDF in an 
HTML page. It’s scalable and can be positioned anywhere on a page 
with precision, which makes it a good subordinate citizen. Adobe has 
yet to define an embeddable PDF. 


MIME: a handy rucksack for SGML, Acrobat and OpenDoc 


MIME allows Internet applications to handle arbitrary file types or
protocols, including executable scripts or applications. One of its early
uses is for attaching rich-format documents to e-mail inside the Simple
Mail Transfer Protocol (see Release 1.0, 2-94). Now SGML and Adobe’s
Acrobat will take advantage of MIME to join the Web;[7] companies are
likely to do the same with OpenDoc soon.


Ed Levinson of Accurate Information Systems in Eatontown, NJ, has proposed 
a pragmatic way for SGML to coexist with the Web. His IETF draft specifi- 
cation shows how MIME can carry SGML documents. It shows how to automati- 
cally resolve packaging issues and file references so the receiving party 
doesn’t have to edit all the internal SGML references. 
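
The general idea of MIME as a carrier can be sketched with Python's
standard email package. This does not reproduce Levinson's draft and does
nothing about resolving file references inside the document; the
package_sgml function is a hypothetical name, and labeling the attachment
application/sgml is our assumption for illustration.

```python
from email.message import EmailMessage


def package_sgml(document_text, filename):
    """Wrap one SGML document in a MIME message as a typed attachment."""
    message = EmailMessage()
    message["Subject"] = "SGML document"
    message.set_content("An SGML document is attached.")
    # The type label tells the receiving application which viewer or
    # parser to hand the bytes to; the bytes themselves travel unchanged.
    message.add_attachment(
        document_text.encode("utf-8"),
        maintype="application",
        subtype="sgml",
        filename=filename,
    )
    return message
```

The receiving side needs no knowledge of SGML to route the attachment. It
reads the type label, finds a program registered for that type and passes
the content along, which is the same mechanism that lets a browser launch
an external viewer.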


An ocean of documents at hand 


Spyglass recently announced that its Enhanced NCSA Mosaic will automatical- 
ly launch Acrobat Exchange when appropriate to display Portable Document 
Format files (PDF; see box, previous page). It will also launch an SGML 
viewer from SoftQuad. With these two additions, Mosaic could access any 
document that prints to PostScript (if it were processed to PDF and put on 
the Web), as well as any document now in SGML. If a few of the large SGML 
document archives make their content available, the number of Web-accessi- 
ble documents could double overnight. 


Spyglass is evaluating whether to include these and other viewers with its 
Mosaic. The company has recently taken over management and licensing of 
commercial versions of Mosaic from the National Center for Supercomputing 
Applications (NCSA), which developed the program. NCSA still offers a 
public-with-copyright version of the program which is not for resale. NCSA 
will distribute the SoftQuad browser with its non-commercial Mosaic. 


In Release 1.0, 5-94, we described how a combination of OpenDoc and the
Internet could marry the best features of the Internet with the best
features of desktop computing. MIME is an easy way to begin this process.
In principle, a Mosaic browser could open an OpenDoc viewer as a MIME type.


A new lease on life for existing markets 


The recent Seybold electronic-publishing show saw plenty of product
introductions; more are on their way at the Web conference in October (see
Resources, page 19). Companies with content already in SGML should be
excited about the attention the Web has gotten. They stand to find broad
new channels through which to distribute their content. The SGML-oriented
software market is also likely to grow because it makes sense for many
documents to be brought into full SGML compliance. Other markets may
benefit from the interplay between SGML and HTML. Here are a few examples.


• Finally, here's wide-area hypertext that works! Previous attempts to
  link content in hypertext, electronic-book and personal-information-
  management systems were constrained within the boundaries of proprie-
  tary databases, naming schemes or file systems. The Web offers ex-
  tensible link space.


• Groupware has likewise lacked a platform that is extensible, malleable
  and broadly used. Good feature ideas were mired in awkwardly designed
  products that didn’t work with common desktop applications. Component
  software and the World Wide Web promise to revitalize this market.


• The dynamic document assembly market finally has an infrastructure.
  Imagine a viewer that can retrieve chunks from several places and flow
  them together into one document, instead of putting each into a sepa-
  rate window, which is an artificial distinction.


• All of these applications will benefit from smarter automatic "page"
  layout of the sort that Pages Software and Interleaf have developed
  over the past few years (see Release 1.0, 7-91). Similarly, OpenDoc
  parts do some negotiation for display space.


7 The World Wide Web doesn’t take full advantage of MIME. It only uses
top-level typing (similar to the way Windows matches file extensions with
applications in WIN.INI), not more powerful features such as multi-part
parallel contents. In principle, HTTP could send a page that contains six
thumbnail images as one package, instead of requiring seven separate inter-
actions to retrieve them. In fact, MIME could use OpenDoc’s Bento.


The all-important <A>nchor 


From the complex, hierarchical-document SGML perspective, HTML’s anchor tag 
<A>, which holds the hypertext links, is no more unusual a markup element 
than those for tables or sidebars. However, the anchor tag is probably
HTML’s most important contribution to the information revolution. The easy
linking of HTML draws attention to the relationships between document 
chunks. The anchor is not ancillary; it is essential: It is a document’s 
link(s) to the outside. 
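

To make the idea of a document's "links to the outside" concrete, the sketch below collects the HREF targets of every anchor in a fragment of HTML, using Python's standard `html.parser` module. The class name `AnchorCollector`, the function `outbound_links` and the sample markup are illustrative assumptions.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Gather the HREF value of every <A> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def outbound_links(html):
    """Return the link targets found in an HTML string, in document order."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return collector.links


sample = ('<P>See the <A HREF="http://www.eit.com/">EIT</A> site '
          'and <A NAME="top">this spot</A>.</P>')
print(outbound_links(sample))
```

Anchors that only name a location (NAME without HREF) are skipped, since they mark a destination rather than point to one.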


The anchor may change what we pay attention to in communication. For exam- 
ple, in an e-mail message, what communicates an idea more powerfully: 
Making sure the inset box is in the right font and in the right place, or 
linking it to six relevant documents across the world? Putting content in 
context may be more compelling than making it look pretty. Editors, 
analysts and colleagues may help put our content in context. Of course, 
there’s a downside: A document with 100 links can be overwhelming. 


Virtual structure: The table of contents as a point of view 


Anchors may help us deconstruct huge documents into more useful components, 
without totally losing their structure. With the exception of the Pin-
nacles DTD, the major DTDs used in the SGML world are highly hierarchical
and linear. But those hierarchies are more arbitrary than their creators 
would let on. It’s one thing to agree within an industry to a particular 
sequence and depth of topics; it’s quite another to generate a large docu- 
ment, count the number of nested subheads, then declare that the document 
"has" six levels, or however many exist. 


8 By contrast, HyTime adds links to SGML in such a complex and cumbersome 
way as to be unusable to novices. As HTML and URLs disappear behind 
graphical authoring tools, they can get more complex; however, access by 
ordinary people is likely to remain important for quite a while. 


One person’s footnote is another person’s thesis statement. What seems im-
portant today may seem trivial tomorrow or need to be updated for all the
new things that point to it. With easy, ubiquitous links, one can create 
virtual structure by creating a table of contents that offers a reader a 
particular path. The same text objects can be used in many different se- 
quences for many different purposes or updated once for all new references. 
It’s not as easy as writing a new table-of-contents page, since the newly 
ordered pages may not make any sense, but the underlying idea is powerful. 
The structure is (part of) the content. 
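

A small sketch may help show how a table of contents acts as a point of view: the same chunks are listed once, and each reading path is just a different ordering of links to them. The chunk names, the relative URLs and the function `table_of_contents` are hypothetical, chosen only for illustration.

```python
from html import escape


def table_of_contents(title, entries):
    """Render one reading path as a plain HTML list of links."""
    lines = [f"<H1>{escape(title)}</H1>", "<UL>"]
    for label, url in entries:
        lines.append(f'<LI><A HREF="{escape(url, quote=True)}">{escape(label)}</A>')
    lines.append("</UL>")
    return "\n".join(lines)


# Hypothetical text objects, each stored once under its own URL.
CHUNKS = {
    "overview": ("Overview", "overview.html"),
    "install": ("Installation", "install.html"),
    "trouble": ("Troubleshooting", "trouble.html"),
}

# Two virtual structures over the same chunks.
novice_path = [CHUNKS[key] for key in ("overview", "install", "trouble")]
support_path = [CHUNKS[key] for key in ("trouble", "install")]

print(table_of_contents("Getting started", novice_path))
print(table_of_contents("Support desk view", support_path))
```

Updating `install.html` once changes what both paths deliver, which is the "updated once for all new references" property; whether the reordered pages still read sensibly is the editorial problem the text warns about.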


HTML and the Web offer a new mode of expression where documents blend into 
the document medium and where links are potentially as important as the 
nodes they connect. It could lead to the deconstruction of many large, 
structured documents or, more significantly, to a collaborative balance be- 
tween large, structured documents and ones built from smaller components. 


"Creating hypertext is different from creating struc- 
tured linear text. Once you’ve done it, you long for 
it."

-- Tim Berners-Lee, WWW Consortium 


Slowly we shift 


The changes in the way we view and use information could really complicate 
content owners’ lives. Content owners will have to collect payments and 
manage documents and links in an increasingly complex environment. Content
authors will have to rethink how chunks are composed and flowed into each 
other. Authors are not likely to know which chunk people have just read, 
or where they will be two links later. 


This may be word-processing vendors’ nightmare come true: The urge to 
share information and link it to other, relevant chunks may dominate other 
urges and force a transfer of content into highly exchangeable forms, in- 
stead of the proprietary formats that each word processor built and now 
defends. Word processing has hit the wall for new features and hasn't even 
implemented many features well. Look at how much time people waste jigger-
ing things to set up an outdent -- or failing to do so and using spaces and
tabs instead.


As we at Release 1.0 move this newsletter toward multiple forms of elec- 
tronic distribution, we face the same questions: What tool should we au- 
thor in? A word processor? A Web document editor? An SGML tool? A page- 
layout program? The following section presents some of the early options. 


SGML affected text the way the first databases affected 
data. The Web is like a second-generation database: 
Think of it as the normalization of documents, the way 
information architects normalized data in the shift from 
the hierarchical to the relational database. 


LIGHTWEIGHT AUTHORING TOOLS AND CONVERTERS


There is already a small but growing market of SGML tools. This section 
focuses on SGML and other tools that people can use on the World Wide Web. 


Despite the sophisticated NextStep system that Berners-Lee created to 
launch the Web, most people use tools that are still primitive. The need 
for better tools has not gone unnoticed. At first, people wrote shareware 
tools, including macros, filters, Perl scripts, Common Lisp programs and 
HyperCard stacks, to complement the public-domain browsers such as Cello, 
MacWeb and Mosaic. An XTension created at MIT converts files from the 
QuarkXPress desktop-publishing program to HTML; researchers at CERN created 
WebMaker, a FrameMaker-to-HTML converter. 


Now SGML, word-processing and Internet software vendors have entered the 
fray. Some of them collaborated to get to market quickly; some scaled 
their SGML authoring packages down in the hope that buyers would upgrade. 
It's possible that the many pieces that are now available may confuse 
potential buyers. This is more complicated than a decision between Ami 
Pro, Word, WordPerfect or XyWrite. Nevertheless, this is a hot category: 
Many new products premiered at the recent Seybold electronic publishing 
show. Here’s a sampling. Some of these companies were described in 
Release 1.0, 7-91. 


Microsoft SGML Author for Word 


At the Seybold show this month, Microsoft announced a product that will al-
low people to use Word to generate SGML documents. The product, Microsoft
SGML Author for Word, is for companies that already use SGML, or would like
to, and want to make SGML available to the large number of people who already
use Word. It can also generate valid HTML. Microsoft expects to ship SGML
Author, which it created with Avalanche Development, by the end of 1994.


SGML Author consists of an administration tool, a converter and sample ap- 
plications for various DTDs, including a simple DTD for people who are un- 
familiar with SGML. Administrators map existing corporate DTDs to Word 
styles, which they distribute to end-users. Administrators may need other 
tools to create the DTDs, such as MicroStar’s Near & Far. 


Document authors create documents in Word 6.0 with those style guides or 
with specific templates. Then they "Save As..." to SGML format, a new op- 
tion on their menus. Doing so invokes a post-processor, which validates 
the use of the SGML tags, corrects SGML errors and annotates the file with 
its changes. Microsoft expects most people to stay within Word through the 
entire cycle. People who want to work in native SGML can use SoftQuad
Enabler, a specialized version of SoftQuad’s Author/Editor integrated to
work with and look more like Word 6.0. 
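

The mapping of Word styles to DTD elements can be pictured as a simple lookup applied paragraph by paragraph, with a report of anything that has no mapping. The sketch below is only a guess at the general shape of such a converter, not Microsoft's implementation; the style names, the element names and the function `styled_to_sgml` are hypothetical.

```python
# Hypothetical mapping an administrator might define: Word style -> DTD element.
STYLE_TO_ELEMENT = {
    "Heading 1": "TITLE",
    "Body Text": "PARA",
    "Caption": "CAPTION",
}


def styled_to_sgml(paragraphs, mapping=STYLE_TO_ELEMENT):
    """Convert (style, text) pairs to tagged lines.

    Returns the tagged lines plus the list of styles that had no mapping,
    a stand-in for the kind of report a post-processor might produce.
    The text is assumed to contain no markup characters.
    """
    tagged, unmapped = [], []
    for style, text in paragraphs:
        element = mapping.get(style)
        if element is None:
            unmapped.append(style)
            continue
        tagged.append(f"<{element}>{text}</{element}>")
    return tagged, unmapped


document = [
    ("Heading 1", "Quarterly report"),
    ("Body Text", "Sales rose in every region."),
    ("My Odd Style", "This paragraph uses an unapproved style."),
]
lines, problems = styled_to_sgml(document)
print("\n".join(lines))
print("Unmapped styles:", problems)
```

A real converter would also have to check nesting and ordering against the DTD, which is where validation tools earn their keep.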


Although it’s not fully interactive, integrating SGML with Word styles is a
minimally disruptive approach to broad corporate use of SGML. By the way,
if you've been counting so far, Word users who want to publish on the Web
should soon be able to (a) edit a raw HTML file, (b) print to
PostScript and create PDF files, (c) generate SGML using SGML Author that
is interpreted as HTML or (d) generate HTML directly using SGML Author.
And that’s not counting independently created Word macro scripts.


Drop an Avalanche on your legacy documents


Avalanche Development, a subsidiary of Interleaf, is best known for its 
SGML parsing program FastTAG, which scans text files, attempts to deduce 
their structure and leaves behind the appropriate SGML tags. Recently, 
Avalanche announced SureSTYLE, a companion product to Microsoft SGML Author 
for Word. Think of SureSTYLE as an SGML grammar checker for SGML Author. 
SureSTYLE uses Avalanche’s visual recognition technology to make document 
styles consistent with those created for SGML Author. Microsoft will dis- 
tribute the base version with SGML Author. The advanced developer's ver- 
sion, SureSTYLE Pro, is more configurable and includes workgroup features. 


Implementing SGML across an enterprise takes considerable expertise. 
Through its new Author Assistant program, Avalanche will help companies 
pick product suites, customize them, design templates, create distribution 
and publishing strategies, train users on-site and do product support. 


SoftQuad’s HoTMetaL: Doom for documents 


Last June, Toronto-based SoftQuad, an SGML pioneer, shipped HoTMetaL, a 
product that should quickly make the company visible on the Web. Like 
SoftQuad Enabler (for Microsoft’s SGML Author), HoTMetaL is a slimmed-down
version of SoftQuad’s Author/Editor. HoTMetaL is designed to generate 
HTML, not SGML. Versions for Sun and Windows are available free on the In- 
ternet, Doom-style (see Resources, page 19); a Mac version is on its way. 
HoTMetaL complements Mosaic: Menu selections in HoTMetaL and the new En- 
hanced NCSA Mosaic allow users to switch back and forth, so they can pre- 
view in Mosaic the files they create in HoTMetaL. 


HoTMetaL Pro costs $195 and adds WYSIWYG table editing, spell-checking, 
multi-page HTML printing, user-defined macros and long or short sets of 
menus. SoftQuad’s flagship product, Author/Editor, helps people deliver 
SGML documents; HoTMetaL is designed to let people deliver HTML documents 
that others can browse with Mosaic and other Web browsers. 


EIT’s authoring tools 


SoftQuad is not the only company working on authoring tools to complement 
Mosaic. Enterprise Integration Technologies (EIT), the company behind Com- 
merceNet, is working on a suite of tools for Motif that includes an HTML 
structure editor, a WYSIWYG HTML document editor, a media browser and a 
link browser (for a pointer to a description, see Resources, page 19). 


EIT’s developers use a Motif widget construction kit called Winterp (the 
OSF/Motif Widget INTERPreter), which allows EIT to get new features as the 
widgets are improved. Although Winterp allows for rapid prototyping and is 
extremely portable between Unix systems, it doesn’t work on other OSs. 


Nevertheless, EIT’s developers are creating some extremely useful features. 
For example, browser "hotlists" are usually a chronological or alphabetical
list of the places you've decided to revisit. EIT’s designers are making
hotlists graphical, so that a user can create a personalized map of the vir-
tual terrain. EIT’s link browser similarly allows users to view portions
of the World Wide Web as a directed graph, with version control and check- 
in. EIT is also working on HTTP server software. 


Release 1.0 27 September 1994 


SERVERS AT YOUR SERVICE 


As Web usage skyrockets, the need for better servers becomes obvious. The 
requirements are similar to other client-server systems: fault tolerance, 
high availability, scalability and wide-area load balancing. In addition, 
servers can be enhanced with many features, including text-retrieval and 
document management. This section examines some such products. 


Mosaic Communications does a rewrite 


Jim Clark, co-founder of Silicon Graphics, knows a business opportunity 
when he sees one. Last spring, as the Web mushroomed and Mosaic was on 
everyone's lips, he whisked away most of the programmers who had created 
NCSA Mosaic at the University of Illinois Urbana/Champaign. With them, he 
founded Mosaic Communications (MCom). The development team, led by Marc 
Andreessen, is about to release its own versions of the Web client and 
server applications. 


Most companies creating browsers for the Web are licensing the original, 
cleaned-up Mosaic code from Spyglass, the official master licenser for com- 
mercial versions. MCom wanted clear rights to and control over the code. 
The team also knew the shortcomings of what it had done on its first, 
quick-and-dirty attempt, so it decided to rewrite the client and server 
software from scratch. The resulting applications, Mosaic NetScape and 
Mosaic NetSite, offer equivalent functionality to the original Mosaic, with 
more robust code and some new features. 


NetScape is optimized for 14.4 Kbps modem connections, runs faster than the 
original Mosaic and has a common feature set on Windows, Mac and X Windows. 
It can display results before the entire file arrives, handle multiple re-
quests in parallel, encrypt files and authenticate servers.


The server software has two Unix-based versions: The NetSite Communica- 
tions Server ($5000; $1500 for the rest of 1994) adds scalability, script-
based setup and security to the current HTTP server software. It also
handles multiple concurrent sessions efficiently. The NetSite Commerce 
Server ($25,000) is for secure commerce. It includes RSA encryption and 
authentication. The servers are scheduled to be available this October and 
November, respectively. 


EIT offers a Web kit, IDI breaks things up and Individual adds markup 


EIT has released software, called the Webmaster’s starter kit, that can help
organizations set up their own Web sites. The kit helps users set up
and configure a Web server and includes code to customize home pages, con- 
vert mail archives to HTML (a nifty utility called HyperMail), generate 
server statistics, verify links and more (see Resources, page 19). 


Information Dimensions Inc. (IDI), a long-time vendor of large-scale text- 
retrieval systems, announced at Seybold a product that takes documents in 
IDI’s BasisPlus text-retrieval system and presents them as HTML documents. 
A follow-on version will interpret SGML documents to HTML. 


Individual Inc., which creates and distributes the First! custom-filtered 
news service, is shipping First! for Mosaic, which adds HTML tags to the
current stream of stories. With the tags, readers can get full-text arti-
cles and search for information such as topic, date, company or product
names. SGML tags are also available. Companies will likely leverage the 
markup by using text-retrieval, categorization or other custom programs to 
automatically link the news stories to documents in their own archives and 
databases. The First! service is also available via fax and e-mail, and as 
a Notes database. 


EBT’s DynaWeb 


Electronic Book Technologies’ flagship product is DynaText, an electronic 
"book" authoring package for SGML first introduced in 1990. DynaText ac- 
cepts any valid SGML document and automatically builds an electronic book 
that can include hyperlinks, tables, equations and graphics. It runs on 
Windows, Macintosh and Unix and includes an integrated indexer, stylesheet 
editor and browser. EBT's most recent products are DynaTag, a tool that 
converts proprietary word-processing documents into DynaText, and DynaBase, 
a native SGML document-management database implemented atop Object Design’s 
ObjectStore. 


This month, EBT expects to field a high-function, high-end product for com- 
panies that have many SGML documents and want to make them available on the 
World Wide Web. The product, called DynaWeb (the DynaText Wide-area Elec- 
tronic Bookserver), is an Internet document server that interprets SGML 
documents and makes them look like HTML documents. It accepts queries from 
Web browsers, searches the SGML database and converts the requested docu- 
ment to HTML on the fly. Its key selling features are that it allows pub- 
lishers to make their large information repositories available on the In- 
ternet, and it automatically segments SGML documents into pieces of ap- 
propriate size for Web browsers. DynaWeb will eventually be bundled with 
the DynaText tool, which costs between $10,000 and $150,000 (ten to an un- 
limited number of users). 
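

As a toy illustration of serving SGML as HTML on the fly, the sketch below splits a tagged document into sections and converts only the requested piece by renaming elements. It is not EBT's method: real SGML needs a proper parser and a DTD, the element names in `TAG_MAP` are hypothetical, and the regular expressions assume simple, unnested, attribute-free tags.

```python
import re

# Hypothetical DTD element -> HTML element mapping.
TAG_MAP = {"TITLE": "H2", "PARA": "P"}


def split_sections(sgml):
    """Return the content of each <SECTION>, assuming sections are not nested."""
    return re.findall(r"<SECTION>(.*?)</SECTION>", sgml, flags=re.DOTALL)


def to_html(fragment, tag_map=TAG_MAP):
    """Rename mapped element tags and leave unknown tags untouched."""
    def rename(match):
        slash, name = match.group(1), match.group(2)
        return f"<{slash}{tag_map.get(name, name)}>"

    return re.sub(r"<(/?)([A-Z][A-Z0-9]*)>", rename, fragment)


def serve_piece(sgml, index):
    """Convert only the requested section, as a server might do per request."""
    return to_html(split_sections(sgml)[index])


doc = ("<SECTION><TITLE>One</TITLE><PARA>First.</PARA></SECTION>"
       "<SECTION><TITLE>Two</TITLE><PARA>Second.</PARA></SECTION>")
print(serve_piece(doc, 1))
```

Converting per request means the SGML source stays the single master copy, and the size of each piece sent to the browser is decided by the server rather than by the author.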


Notes as a Web server 


The Internet is both a threat to and an opportunity for Notes. On the one 
hand, many corporations see Gopher, e-mail and the Web as alternative ways 
to create shared document databases or discussion groups. On the other 
hand, even cursory inspection of the available Internet tools makes clear 
that robust and secure document-management systems can really help com- 
panies that want to use the Internet to communicate and publish. Lotus 
Notes is one such robust system, and it has grown steadily from work-group 
to enterprise use and now, with AT&T Network Notes, to public networks. An 
increasing number of Notes users already use the Internet as a transport 
service: They replicate Notes databases and connect Notes client software 
to document databases over the Internet. 


Now, under a project named InterNotes, Lotus will add features to Notes so 
it connects more effectively with the Internet in both directions: as a way 
for Notes users to access the Internet alongside their usual Notes applica- 
tions and databases, and as a way for Internet citizens to access informa- 
tion stored in Notes databases that companies want to make available over 
the Internet. 


Over time, Notes users will see increased functionality that allows them to 
browse Internet documents with their existing client software. For the 
Web, Lotus will offer a Mosaic proxy that runs on a Notes server. The
Mosaic proxy will dynamically translate Web documents into Notes documents, 
complete with DocLinks where embedded Web tags are. 


To make information available to people across the Internet, Lotus will de- 
velop an HTTP module that runs next to Notes servers. To make any docu- 
ments stored in a Notes database look like proper Web documents, the module 
will map Web queries to Notes commands, filter or translate documents, if 
necessary, and transmit the responses. In principle, Notes can be a great 
repository for information that companies want to make available selective- 
ly across the Internet. Notes offers features such as access control, ap- 
plication APIs, third-party software and replication services. Also, work 
groups can use the Notes platform to collaborate as they create and 
maintain Web documents. 


The Internet, with pricing policies that often include flat-rate access 
anywhere around the world and transmission speeds well above a fast modem, 
is affecting the way organizations distribute information. After all, why 
replicate a database when the original dataset is available almost in- 
stantaneously? 


Notes was designed for a pessimistic communications scenario: an environ- 
ment in which wide-area communications are slow and expensive, with users 
who are not connected to the network all the time. In contrast, the Inter- 
net assumes full-time connections. Recent Internet dialup facilities such 
as the Serial Line Internet Protocol (SLIP) allow host computers to move 
about, but don’t help distribute or cache information; applications still 
have to do that. Also, Web protocols are efficient at setting up sessions 
and transferring information, but they're not well adapted to load balanc- 
ing when traffic is heavy. 


Since it is designed for a worst-case world, Notes has some capabilities 
that complement the Internet’s own. An obvious one is Notes’ replication 
feature, rethought for mixed Internet and offline access. Or imagine Notes 
client software enhanced with intelligently cached Web pages, so that a 
user can capture a meaningfully linked set of HTML pages and work with them 
offline. The Internet and Notes will co-evolve. 


COMING SOON 


Smart communications software.
Advertising online.
Software for education.
Intelligent drawing and authoring tools.
And much more... (If you know of any
good examples of the categories listed
above, please let us know.)


RESOURCES & PHONE NUMBERS


Ed Levinson, Accurate Information Systems, (908) 389-5550; 
<elevinso@accurate.com> 

James C. King, Adobe Systems, (415) 962-4944; fax, (415) 962-6063; 
<jking@adobe.com> 

Jed Harris, Component Integration Lab, (408) 974-6549; fax, (408) 974-9710; 
<jed@cil.org> 

Lou Reynolds, Electronic Book Technologies (EBT), (401) 421-9550; fax, (401) 
421-9551; <lrr@ebt.com> 

Jay Glicksman, Enterprise Integration Technologies (EIT), (415) 616-8000; 
fax, (415) 617-8019; <jay@eit.com> 

Dan Connolly, HaL Computer Systems, (512) 834-9962 x5010; fax, (512) 834- 
9963; <connolly@hal.com> 

Dave Raggett, Hewlett-Packard, 44 (272) 228-046; fax, 44 (272) 228-003; 
<dsr@hplb.hpl.hp.com>

Bill Young, Information Dimensions, Inc. (IDI), (614) 761-7299; fax, (614) 
761-7290 

Yosi Amram, Individual, Inc., (617) 354-2230; fax, (617) 354-6210; 
<yosi@individual.com>

Charles Goldfarb, Information Management Consulting, (408) 867-5553; fax, 
(408) 867-1805; <gml@almaden.ibm.com>

Haviland Wright, Interleaf, (617) 290-4990 *1714; fax, (617) 290-4981; 
<haviland@ileaf.com> 

Paul Haverstock, Lotus, (617) 693-4264; fax, (617) 693-5541;
<paul_haverstock@crd.lotus.com>

Chris Locke, MecklerWeb, (203) 226-6967; fax, (203) 454-5840;
<clocke@panix.com> 

John Vail, Microsoft, (206) 936-7407; fax, (206) 936-7329; 
<johnva@microsoft.com> 

Jim Clark, Marc Andreessen, Mosaic Communications, (415) 254-1900; fax,
(414) 254-2601; <jim@mcom.com>, <marca@mcom.com>

Erik Naggum, Naggum Software, 47 2295-0313; <erik@naggum.no>

Bruce Webster, Pages, (619) 492-9050 x212; fax, (619) 492-9124;
<bwebster@pages.com>

Dan Pattyn, Sematech, (512) 356-3868; fax, (512) 356-3575;
<dan.pattyn@sematech.com> 

Mary Fletcher Laplante, SGML Open, (412) 264-4258; fax, (412) 264-6598; 
<laplante@sgmlopen.com> 

Yuri Rubinsky, SoftQuad, (416) 239-4801; fax, (416) 239-7105; <yuri@sq.com> 

Tim Krauskopf, Spyglass, (217) 355-6000; fax, (217) 355-8925;
<timk@spyglass.com> 

Udi Shapiro, Ubique, (415) 896-2434; fax, (415) 541-7775; <udi@ubique.com> 

Tim Berners-Lee, WWW Consortium, (617) 253-9670; fax, (617) 258-8682;
<timbl@w3.org>


For further reading:
Visit the Text Encoding Initiative at http://etext.virginia.edu/TEI.html


HTML-WG (working group) is a mailing list focused on getting HTML 2.0 fin- 
ished. Send message body "help" to majordomo@oclc.org; it will reply 
with a list of lists it hosts, as well as subscription instructions. 


For help with Web development questions, hit the World Wide Web Virtual Li-
brary, at http://www.charm.net/web/Vlib.html


Dan Connolly’s HTML Design Notebook points to many other resources, includ- 
ing official definitions of current and proposed HTML specifications. 
It's at http://www.hal.com/%7Econnolly/drafts/html-design.html


Get SoftQuad’s HoTMetaL via ftp from several sites, including 
ftp.ncsa.uiuc.edu:/Mosaic/contrib/Softquad


The remote HTML validation service Connolly has made available is at 
http://www.hal.com/users/connolly/html-test/service/validation-form.html


Find EIT's Webmaster starter kit at http://wsk.eit.com/wsk/doc/ 


EIT’s document "Internet Publishing Via the World Wide Web" is at 
http://www.eit.com/papers/gpware94/paper.html


For an excellent explanation of the different kinds of markup, read "Markup 
systems and the future of scholarly text processing," by James H. 
Coombs, Allen H. Renear and Steven J. DeRose, published in Communica- 
tions of the ACM, November 1987. 


Release 1.0 is published monthly, except for a combined July/August issue, 
by EDventure Holdings, 104 Fifth Ave., New York, NY 10011-6901; (212) 924- 
8800; fax, (212) 924-0240. It covers PCs, software, computer-telephone in- 
tegration, groupware, text management, connectivity, messaging, wireless 
communications, artificial intelligence, intellectual property law and other 
unpredictable topics. A companion publication, Rel-EAST, is an information 
bulletin on emerging technology markets in Central Europe and the former 
Soviet Union. Editor: Esther Dyson <esther@edventure.com>; publisher:
Daphne Kis <daphne@edventure.com>; managing editor: Jerry Michalski 
<jerry@edventure.com>; circulation & fulfillment manager: Robyn Sturm 
<robyn@edventure.com>; executive assistant: Christina Koukkos; editorial & 
marketing communications consultant: William M. Kutik <kutik@edventure.com>. 
Copyright 1994, EDventure Holdings Inc. All rights reserved. No material 
in this publication may be reproduced without written permission; however, 
we gladly arrange for reprints or bulk purchases. Subscriptions cost $595 
per year, $650 overseas. 


RELEASE 1.0 CALENDAR


October 2-5 SPA fall symposium - Boca Raton. Sponsored by Software Pub- 
lishers Association. Call Karen Johnson, (202) 452-1600; fax, 
(202) 223-8756. 

October 3-5 Power '94 - Santa Clara, CA. Sponsors: BIS Strategic Deci- 
sions and Arthur D. Little. Call Marlene Nusbaum, (800) 874- 
9980; fax, (617) 878-6650. 


October 3-7 Software Development '94 East - Washington, DC. Sponsor: Mil- 
ler Freeman. Call Jessica Powers (415) 905-4928; fax, (415) 
905-2222. 

October 4-6 International Software Industry Trade Show - Washington, DC. 
Co-located with Software Development '94 (see above). 

October 4-5 Managing the Privacy Revolution - Washington, DC. Sponsor:
Privacy and American Business. Call Lucy Vidal, (201) 996-
1154; (201) 996-1883.


October 4-6 UNIX Expo - New York City. Sponsor: Bruno Blenheim. Call An- 
nie Scully, (201) 346-1400, x145; fax, (201) 346-1602. 

October 5-7 CD-ROM Expo & Conference - Boston. Sponsors: Digital Video
Magazine, CD-ROM Professional and MacWorld Magazine. Call
David Eliot, (800) 945-3318; fax, (617) 361-9074.


October 5-8 Software Publishers Association - Dallas. Sponsor: SPA. Call 
Nadia Kader, (202) 452-1600, x339; fax, (202) 785-3649. 

October 6-8 *Euro Channels '94 - Paris. For distributors and vendors,
with panels by Esther Dyson. Sponsor: Global Touch. Call
Denise Sangster, (510) 601-7573; fax, (510) 601-5639.

October 8-9 1994 Annual Meeting of Computer Professionals for Social Re- 
sponsibility - UC San Diego. Sponsor: CPSR. Call Susan Evoy, 
(415) 322-3778; fax, (415) 322-4748. 


October 10-12 Technology Seminar - Baltimore. Sponsor: Alex. Brown & Sons. 
Call Kimberly Lynne, (410) 783-3240; fax, (410) 783-3058. 

October 11-13 Intelligent multimedia information retrieval systems & man-
agement - New York City. Sponsors: Centre De Hautes Etudes
Internationales D'Informatique Documentaire France and Center
for Advanced Study of Information Systems USA. Contact: J.M.
Brentano, 33 (1) 42 85 04 75; fax, 33 (1) 48 78 49 61; in US
Peter Brodnitz, phone/fax (212) 741-1421.

October 11-13 MultiMedia Expo - San Francisco, CA. Sponsors: American Ex-
positions, MDG, & TICS. (212) 226-4141; fax, (212) 226-4983.

October 12-14 *EMA Messaging Leadership Conference - Washington, DC.
Sponsor: EMA. Esther Dyson to speak at lunch Friday. Call 
Megan Spillane, (703) 524-5550; fax, (703) 524-5558. 

October 13-14 Global Mobile - Paris. Sponsor: Tellabs. Call Denys Gilhooly, 
33 (1) 49 52 33 00; fax, 33 (1) 49 52 07 56. 

October 17-18 Hollywood 2000 - Los Angeles. Sponsors: Video Store Magazine, 
PC Graphics & Video, and Response TV. Call Trisha Allen, 
(800) 854-3112; fax, (714) 513-8481. 

October 17-19 Microprocessor Forum - San Francisco, CA. The PC Forum of the 
microprocessor community. Sponsor: MicroDesign Resources. 
Call Leslie Hunziker, (707) 824-4006; fax, (707) 823-0504.

World Wide Web Conference - Chicago. Where Web directions are
set. Sold out. Sponsor: WWW Consortium. E-mail well@osf.org
or try http://riwww.osf.org:8001/ri/announcements/WWW_Conf_F94.html.

Defining the Electronic Consumer III - New York City. Pin- 
point those elusive consumers! Sponsor: Jupiter Communica- 
tions. Call David Schwartz, (212) 941-9252; fax, (212) 941- 
7376. 

PC Expo - Chicago. Sponsored by Bruno Blenheim. Call Annie 
Scully, (201) 346-1400; fax, (201) 346-1602. 

The Next Economy - An Evolving Information Ecosystem - San 
Francisco, CA. Sponsor: Bionomics Institute. Call Beth Wein- 
rich, (415) 454-1000; fax, (415) 454-7460. 

@CSCW '94 - Chapel Hill, NC. Sponsor: ACM. With Jerry
Michalski. Call Kevin Jeffay, (919) 962-1938; fax, (919) 
962-1799. 

OOPSLA '94 - Portland. Objects in all their morphisms.
Sponsor: ACM. Call Steve Poltrack, (206) 865-3270.

Fall membership meeting, Massachusetts Software Council -
Newton, MA. Sponsor: Massachusetts Software Council. Fax
(617) 437-9686. 

INTEROP 94 - Paris. The mother of all networking conferences. 
Sponsor: Interop Europe. Contact: Carinne Propper, 33 (1) 
4639-5656; fax, 33 (1) 4639-5699. 

Wireless Data '94 - San Francisco, CA. Sponsors: Wireless
Magazine & Datacomm Research. Contact: Frank Rimler, (201)
285-1500; fax, (201) 285-1519.

Assets '94 - Marina del Rey. Sponsors: ACM and SIGCAPH. Con-
tact: Ephraim Glinert, Dept. of Computer Science, RPI, Troy,
NY 12180 or glinert@cs.rpi.edu. 

PDA Industry Forum - San Jose. A conference for users, pro- 
grammers & manufacturers. Call Jon Covington, (415) 252-8008; 
fax, (415) 252-8055. 

*The hackers conference - North Lake Tahoe. The tenth annual! 
Sponsors: Microsoft, Fantasia Systems, Point Foundation and 
others. Call Glenn Tenney, (415) 574-3420; fax, (415) 574- 
0546. 

Electronic document systems conference & exhibit - Phoenix. 
Sponsored by Xplor International. Call Anne Davison, (310) 
373-3633; fax, (310) 375-4240. 

@BusinessNet - New York City. Sponsored by CMP Publications 
and New Media Associates. Covers online activity, from ad- 
vertising to virtual communities. With panel moderated by 
Jerry Michalski. Call Irene McCarty, (516) 733-6740; fax, 
(516) 733-6753. 

Technology 2004 - Washington, DC. Sponsors: NASA and the 
Technology Utilization Foundation. Call Wendy Janiel, (212) 
490-3999; fax, (212) 986-7864. 

@Advertising Day - New York City. Sponsor: Center for Commu- 
nication. With "commercial netiquette" panel moderated by 
Jerry Michalski. Call Laura Blum, (212) 836-3050; fax, (212) 
836-2773. 

Advanced Distributed Simulation Conference - Washington, DC. 
Sponsor: The Technical Society of America. Call Dana Marcus, 
(310) 534-3922; fax, (310) 534-0743. 

*Comdex - Las Vegas. The biggest US show of all. Sponsored by
the Interface Group. Call Peter Young, (617) 449-6600; fax, 
(617) 449-6953. 

The Art of Software Design - San Jose. With Alan Cooper, in- 
ventor of Visual Basic. Sponsored by the Association of Soft- 
ware Design. Call Cynthia Lewis, (510) 841-5808; fax, (510) 
848-4721. 

IT Services '94 - Santa Clara. Sponsor: Creative Expos and 
Conferences. Call Cherif Moujabber, (508) 660-7099; fax, 
(508) 668-2416. 

Internet World '94 - Washington, DC. Sponsor: Mecklermedia. 
Call Milissa Brigante, (203) 226-6967; fax, (203) 454-5840. 


1995 


@SoftExpo95 - San Jose. Sponsored by Software Publisher Maga- 
zine. With Jerry Michalski. Call David Webster, (303) 745- 
5711; fax, (303) 745-5712. 

ExpoComm Mexico 95 - Mexico. Sponsors: TELMEX, TIA, & IEEE. 
Call Anna Simmons, (301) 986-7800; fax, (301) 986-4538.

*Demo 95 - Palm Springs. Stewart and David’s picks. Sponsored 
by InfoWorld Editorial Products. Call Therese Solimeno, (415) 
312-0545; fax, (415) 312-0547. 

*@Two BBSCON - Dusseldorf. Sponsored by Two BBSCON. Learn 
about bulletin-boards and online services in Europe. With 
Esther Dyson and Jerry Michalski. Call Philipp Ziegler or 
Corinne Jost, 41 (75) 373 28 32; fax, 41 (75) 373 30 62.

Networks Expo - Boston. Sponsored by Bruno Blenheim. Call An- 
nie Scully, (201) 346-1400; fax, (201) 346-1602. 

Digital Hollywood - Beverly Hills. Sponsored by American Ex- 
positions. Call (212) 226-4141. 

**PC Forum - Phoenix. Sponsored by us: You read the newslet- 
ter; now meet the players. Call Daphne Kis, (212) 924-8800; 
fax, (212) 924-0240; daphne@edventure.com. 

Documation '95 - Long Beach, CA. Co-sponsored by PTM, GCA, 
The Gilbane Report and GCARI. Call Frank Gilbane, (617) 576- 
5700; fax, (617) 576-5708, or Marion Elledge, (703) 519-8160; 
fax, (703) 548-2867. 

WINLAB Workshop - East Brunswick, NJ. Sponsor: WINLAB. Meet 
the experts in wireless networks. Call Melissa Gelfman, (908) 
445-0283; fax, (908) 445-3693. 

CHI '95: Mosaic of Creativity - Denver. Sponsored by ACM.
Call Rosemary Wick Stevens, (415) 328-3600. 

@EMA '95 - New Orleans. Sponsored by Electronic Messaging As-
sociation. Call Heather Burneson, (703) 524-5550; fax, (703) 
524-5558. 

@CES Interactive - Philadelphia. Sponsored by Electronic In- 
dustries Association’s Consumer Electronics Group. Call 
Cynthia Upson, (202) 457-8728; fax, (202) 833-7370. 


* Events Esther plans to attend. 
@ Events Jerry plans to attend. 


Lack of a symbol is no indication of lack of merit. 
Please let us know about other events we should include. -- Christina Koukkos 